VIBRANT Trial - Primary Analyses

Author

Laura Symul (UCLouvain), Laura Vermeren (UCLouvain), Susan Holmes (Stanford University)

Published

June 30, 2025

Code
library(tidyverse)
library(magrittr)
library(gt)
library(patchwork)
library(SummarizedExperiment)
library(tidySummarizedExperiment)
library(brms)
library(gtsummary)
library(labelled)
library(scales)
# library(gridExtra)
library(ggpubr)
# library(mia) # BiocManager::install("mia")

theme_set(theme_light())
tmp <- fs::dir_map("R", source)
rm(tmp)
Code
tmp <- fs::dir_map("../VIBRANT-99-utils/R/", source)
rm(tmp)

This document details the analyses and provides results for the primary analyses of the VIBRANT trial.

Loading the data

The data used for these analyses are stored in an R Bioconductor MultiAssayExperiment object which contains the QCed and pre-processed data. The code and description of how raw data have been processed and QCed is available on the VIBRANT-0-QC-and-preprocessing GitHub repository (Still private at the moment, but will be made public upon publication, and anyone wishing to have access can email Laura Symul).

Code
mae_file <- 
  fs::dir_ls(str_c(data_dir(), "04 unblinded MAEs/"), regexp = "mae_full_.*\\.rds$") |> 
  sort(decreasing = TRUE) |>
  magrittr::extract(1)

cat(str_c("Loading the RDS file containing the unblinded VIBRANT MAE: /", mae_file |> str_remove(data_dir()), "\n"))
Loading the RDS file containing the unblinded VIBRANT MAE: /04 unblinded MAEs/mae_full_20250624.rds
Code
mae <- readRDS(mae_file)

rm(mae_file)

Demographics (Table 1)

Summary table

The demographics table is built as follow.

We first create a table1_data data.frame from the participant_crfs_merged table stored in the @metadata slot of the mae.

This table contains almost all variables needed to create the demographics table (i.e., site, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method).

In addition, the mae@colData contains the participants’ arm and study population flags (here we keep the mITT population).

From the variables listed above, we create the variable food_insecurity using the three variables cut_size_meals, eat_less and hungry_did_not_eat: if a participant answered “Yes” to any of the corresponding questions, then she is considered to be affected by food insecurity.

Code
table1_data <- 
  metadata(mae)$participant_crfs_merged |> 
  dplyr::left_join(
    colData(mae) |> as_tibble() |> select(pid, site, arm, ITT, mITT, PP) |> distinct(), 
    by = join_by(pid, site)
    )  |> 
  filter(mITT) |> 
  select(pid, site, arm, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method) |> 
  mutate(
    food_insecurity = ifelse(
      cut_size_meals == "Yes" | eat_less == "Yes" | hungry_did_not_eat == "Yes",
      "Yes",
      "No"
    ),
    food_insecurity = factor(food_insecurity, levels = c("No", "Yes"))
  ) |> 
  select(-c(eat_less, cut_size_meals, hungry_did_not_eat))
Code
table1_data |> 
  ggplot(aes(x = food_insecurity)) +
  geom_bar() +
  labs(x= "Food insecurity", y = "Count")+
  theme_classic()

In addition to these variables, we also include the participants’ partner’s gender.

Participants are asked about their partner’s gender at all weekly visits (1000, 1100, 1200, 1300, 1400, 1500, 1700, 1900, 2120) using CRF 19.

Answers to that question can be found in the past_week_sex_partner column of the mae@metadata$visits_crfs_merged table. Participants’ answer throughout the weekly visits will be summarized into one of these 4 categories:

  • No sex
  • Only male
  • Only female
  • Both

This variable is added to the table1_data table.

Code
gender_sexual_partner <- 
  metadata(mae)$visits_crfs_merged |>
  select(pid, past_week_sex_partner) |>
  filter(!is.na(past_week_sex_partner)) |>
  group_by(pid) |>
  summarise(
    gender_sexual_partner = 
      case_when(
        all(past_week_sex_partner == "No sex in the past week") ~ "No sex",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          !any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Only man",
        any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) &
          !any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) ~ "Only female",
        any(past_week_sex_partner %in% c("A single male partner", "Multiple male partners")) &
          any(past_week_sex_partner %in% c("A single female partner", "Multiple female partners")) ~ "Both",
        TRUE ~ NA_character_
      ),
    .groups = "drop"
  )

table1_data <- 
  table1_data |> 
  left_join(gender_sexual_partner, by = "pid")

We also re-code the race variable as follows:

  • “Asian or South Asian” is re-coded to “Asian” (for brevity)
  • “Black, African American, or African” and “African” are re-coded to “Black/African” (for brevity)
  • “Coloured”, “White”, “Other”, and “Prefered not to answer” are left as is.
Note

@Disebo & Caroline: do we go for British or American English? (Colored vs Coloured)?

Code
table1_data <-
  table1_data |>
  
  mutate(
    across(where(is.character), as.factor),
    
    race = fct_recode(
      race,
      "Asian" = "Asian or South Asian",
      "Black/African" = "Black, African American, or African",
      "Black/African" = "African",
      "Coloured" = "Coloured",
      "White" = "White",
      "Other" = "Other",
      "Prefer not to answer" = "Prefer not to answer"
    ),
    
    race = fct_relevel(race, "Asian", "Black", "Coloured", "White", "Other", "Prefer not to answer"),
    arm = arm |> fct_drop()
  ) |>
  
  set_variable_labels(
    food_insecurity = "Report food insecurity in past 12 months ",
    sexual_partners_past_month = "Number of partners in past month",
    sexual_partners_lifetime = "Number of partners in lifetime",
    ethn = "Ethnicity", 
    contraception_method = "Contraception", 
    gender_sexual_partner = "Gender of sexual partners"
  )
Warning

@Laura Vermeren: I changed the summary statistics from mean to median for the continuous variables (for number of past sexual partner, the mean was artificially high in the arm with the 90 partner MGH participant). Would you be able to check that the tests selected by the gt_summary package are appropriate for the data we have? They should be, but I’d love if we could double-check :)

Table 1 (Population characteristics by arm)

Code
table1_data |> 
  select(-pid) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p(include = -Site)
Characteristic Pl
N = 191
LC-106-7
N = 211
LC-106-3
N = 151
LC-106-o
N = 151
LC-115
N = 201
p-value2
Site





    CAP 14 (74%) 14 (67%) 14 (93%) 14 (93%) 14 (70%)
    MGH 5 (26%) 7 (33%) 1 (6.7%) 1 (6.7%) 6 (30%)
Age 29 (20-38) 29 (18-40) 26 (21-34) 25 (19-35) 30 (20-40) 0.5
Race




0.5
    Asian 1 (5.3%) 0 (0%) 1 (6.7%) 0 (0%) 0 (0%)
    Coloured 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    White 2 (11%) 3 (14%) 0 (0%) 0 (0%) 1 (5.0%)
    Other 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 1 (5.0%)
    Prefer not to answer 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Black/African 14 (74%) 18 (86%) 14 (93%) 15 (100%) 18 (90%)
Ethnicity




0.8
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 1 (100%) 1 (100%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 0 (0%) 0 (0%) 2 (33%)
    Unknown 14 14 14 14 14
Number of partners in lifetime 5 (1-10) 5 (1-90) 4 (1-10) 4 (1-20) 5 (1-10) 0.8
Number of partners in past month 1 (1-2) 1 (0-2) 1 (0-2) 1 (0-3) 1 (0-2) 0.4
Contraception




>0.9
    Continuous combined oral contraceptive pills (skip placebo) 3 (16%) 5 (24%) 1 (6.7%) 1 (6.7%) 3 (15%)
    Contraceptive vaginal ring 0 (0%) 1 (4.8%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (5.0%)
    Depo provera 11 (58%) 10 (48%) 12 (80%) 11 (73%) 11 (55%)
    Hormonal implant 1 (5.3%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (11%) 2 (9.5%) 0 (0%) 0 (0%) 2 (10%)
    Net-EN 2 (11%) 2 (9.5%) 2 (13%) 3 (20%) 2 (10%)
    Other 0 (0%) 1 (4.8%) 0 (0%) 0 (0%) 1 (5.0%)
Report food insecurity in past 12 months 7 (37%) 7 (33%) 6 (40%) 5 (33%) 6 (30%) >0.9
Gender of sexual partners




0.8
    No sex 1 (5.3%) 3 (14%) 1 (6.7%) 1 (6.7%) 3 (15%)
    Only man 18 (95%) 18 (86%) 14 (93%) 14 (93%) 17 (85%)
1 n (%); Median (Min-Max)
2 Kruskal-Wallis rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

Study population characteristics appear balanced across arms.

Population characteristics by site

Code
table1_data |> 
  select(arm, age, race, site, food_insecurity, contraception_method, sexual_partners_past_month, sexual_partners_lifetime) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Site,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  )  |> 
  add_p()
Characteristic CAP
N = 701
MGH
N = 201
p-value2
Arm

0.2
    Pl 14 (20%) 5 (25%)
    LC-106-7 14 (20%) 7 (35%)
    LC-106-3 14 (20%) 1 (5.0%)
    LC-106-o 14 (20%) 1 (5.0%)
    LC-115 14 (20%) 6 (30%)
Age 26 (18-37) 33 (22-40) <0.001
Race

<0.001
    Asian 0 (0%) 2 (10%)
    Coloured 0 (0%) 0 (0%)
    White 0 (0%) 6 (30%)
    Other 0 (0%) 2 (10%)
    Prefer not to answer 0 (0%) 1 (5.0%)
    Black/African 70 (100%) 9 (45%)
Report food insecurity in past 12 months 28 (40%) 3 (15%) 0.038
Contraception

<0.001
    Continuous combined oral contraceptive pills (skip placebo) 5 (7.1%) 8 (40%)
    Contraceptive vaginal ring 0 (0%) 1 (5.0%)
    Cyclic combined oral contraceptive pills 0 (0%) 1 (5.0%)
    Depo provera 54 (77%) 1 (5.0%)
    Hormonal implant 0 (0%) 1 (5.0%)
    Hormone containing IUD 0 (0%) 6 (30%)
    Net-EN 11 (16%) 0 (0%)
    Other 0 (0%) 2 (10%)
Number of partners in past month 1 (0-3) 1 (0-2) 0.4
Number of partners in lifetime 4 (1-20) 10 (4-90) <0.001
1 n (%); Median (Min-Max)
2 Fisher’s exact test; Wilcoxon rank sum test; Pearson’s Chi-squared test

There are, unsurprisingly, striking differences in study population characteristics by site.

Population characteristics by arm, split by site

Code
table1_data |> 
  filter(site == "CAP") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = CAP**")
Site = CAP
Characteristic Pl
N = 141
LC-106-7
N = 141
LC-106-3
N = 141
LC-106-o
N = 141
LC-115
N = 141
p-value2
Age 26 (20-36) 26 (18-36) 26 (21-34) 25 (19-35) 25 (20-37) >0.9
Race




>0.9
    Asian 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Coloured 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    White 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Prefer not to answer 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Black/African 14 (100%) 14 (100%) 14 (100%) 14 (100%) 14 (100%)
Ethnicity





    Not hispanic/latino/spanish 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%)
    hispanic/latino/spanish 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%) 0 (NA%)
    Unknown 14 14 14 14 14
Number of partners in lifetime 4 (1-10) 3 (1-7) 4 (1-10) 4 (1-20) 4 (1-9) 0.8
Number of partners in past month 1 (1-2) 1 (1-2) 1 (0-2) 1 (0-3) 1 (1-2) 0.2
Contraception




0.9
    Continuous combined oral contraceptive pills (skip placebo) 1 (7.1%) 2 (14%) 0 (0%) 0 (0%) 2 (14%)
    Contraceptive vaginal ring 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Depo provera 11 (79%) 10 (71%) 12 (86%) 11 (79%) 10 (71%)
    Hormonal implant 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Net-EN 2 (14%) 2 (14%) 2 (14%) 3 (21%) 2 (14%)
    Other 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Report food insecurity in past 12 months 7 (50%) 5 (36%) 6 (43%) 5 (36%) 5 (36%) >0.9
Gender of sexual partners




>0.9
    No sex 1 (7.1%) 2 (14%) 1 (7.1%) 1 (7.1%) 2 (14%)
    Only man 13 (93%) 12 (86%) 13 (93%) 13 (93%) 12 (86%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test; Pearson’s Chi-squared test
Code
table1_data |> 
  filter(site == "MGH") |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")
Site = MGH
Characteristic Pl
N = 51
LC-106-7
N = 71
LC-106-3
N = 11
LC-106-o
N = 11
LC-115
N = 61
p-value2
Age 32 (26-38) 33 (24-40) 25 (25-25) 35 (35-35) 33 (22-40) 0.7
Race




0.13
    Asian 1 (20%) 0 (0%) 1 (100%) 0 (0%) 0 (0%)
    Coloured 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    White 2 (40%) 3 (43%) 0 (0%) 0 (0%) 1 (17%)
    Other 1 (20%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Prefer not to answer 1 (20%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Black/African 0 (0%) 4 (57%) 0 (0%) 1 (100%) 4 (67%)
Ethnicity




0.8
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 1 (100%) 1 (100%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 0 (0%) 0 (0%) 2 (33%)
Number of partners in lifetime 9 (7-10) 15 (6-90) 7 (7-7) 10 (10-10) 6 (4-10) 0.059
Number of partners in past month 1 (1-2) 1 (0-2) 1 (1-1) 1 (1-1) 1 (0-2) >0.9
Contraception




>0.9
    Continuous combined oral contraceptive pills (skip placebo) 2 (40%) 3 (43%) 1 (100%) 1 (100%) 1 (17%)
    Contraceptive vaginal ring 0 (0%) 1 (14%) 0 (0%) 0 (0%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Depo provera 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (17%)
    Hormonal implant 1 (20%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (40%) 2 (29%) 0 (0%) 0 (0%) 2 (33%)
    Net-EN 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 1 (14%) 0 (0%) 0 (0%) 1 (17%)
Report food insecurity in past 12 months 0 (0%) 2 (29%) 0 (0%) 0 (0%) 1 (17%) 0.8
Gender of sexual partners




>0.9
    No sex 0 (0%) 1 (14%) 0 (0%) 0 (0%) 1 (17%)
    Only man 5 (100%) 6 (86%) 1 (100%) 1 (100%) 5 (83%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test

Study population characteristics appear balanced across arms at the CAPRISA site. Given the low number of participants at the MGH site in two of the five arms, we create an additional table excluding these two arms.

Code
table1_data |> 
  filter(site == "MGH", !(arm %in% c("LC-106-o", "LC-106-3"))) |> 
  mutate(arm = arm |> fct_drop()) |> 
  select(-c(pid, site)) |>
  rename_with(~ str_to_title(.x)) |>  
  tbl_summary(
    by = Arm,
    type =
      list(
        Sexual_partners_past_month ~ "continuous",
        Sexual_partners_lifetime ~ "continuous"
      ),
    statistic = list(
      all_continuous() ~ "{median} ({min}-{max})",
      all_categorical() ~ "{n} ({p}%)"
    )
  ) |> 
  add_p()|>
  modify_caption("**Site = MGH**")
Site = MGH
Characteristic Pl
N = 51
LC-106-7
N = 71
LC-115
N = 61
p-value2
Age 32 (26-38) 33 (24-40) 33 (22-40) >0.9
Race


0.14
    Asian 1 (20%) 0 (0%) 0 (0%)
    Coloured 0 (0%) 0 (0%) 0 (0%)
    White 2 (40%) 3 (43%) 1 (17%)
    Other 1 (20%) 0 (0%) 1 (17%)
    Prefer not to answer 1 (20%) 0 (0%) 0 (0%)
    Black/African 0 (0%) 4 (57%) 4 (67%)
Ethnicity


0.7
    Not hispanic/latino/spanish 3 (60%) 6 (86%) 4 (67%)
    hispanic/latino/spanish 2 (40%) 1 (14%) 2 (33%)
Number of partners in lifetime 9 (7-10) 15 (6-90) 6 (4-10) 0.017
Number of partners in past month 1 (1-2) 1 (0-2) 1 (0-2) 0.8
Contraception


>0.9
    Continuous combined oral contraceptive pills (skip placebo) 2 (40%) 3 (43%) 1 (17%)
    Contraceptive vaginal ring 0 (0%) 1 (14%) 0 (0%)
    Cyclic combined oral contraceptive pills 0 (0%) 0 (0%) 1 (17%)
    Depo provera 0 (0%) 0 (0%) 1 (17%)
    Hormonal implant 1 (20%) 0 (0%) 0 (0%)
    Hormone containing IUD 2 (40%) 2 (29%) 2 (33%)
    Net-EN 0 (0%) 0 (0%) 0 (0%)
    Other 0 (0%) 1 (14%) 1 (17%)
Report food insecurity in past 12 months 0 (0%) 2 (29%) 1 (17%) 0.7
Gender of sexual partners


>0.9
    No sex 0 (0%) 1 (14%) 1 (17%)
    Only man 5 (100%) 6 (86%) 5 (83%)
1 Median (Min-Max); n (%)
2 Kruskal-Wallis rank sum test; Fisher’s exact test

Besides the number of lifetime sexual partners, arms appear balanced across arm at MGH.

Primary outcomes (Table 2)

Colonization by week 5

The primary outcome is defined as “Colonization by any of the LBP strains after product administration by week 5”, and colonization by any of the LBP strain is defined as a relative abundance of 5% by any single strain or 10% total relative abundance of the LBP strains. The relative abundance of LBP strain is estimated using metagenomics (taxonomic composition estimated using VIRGO2 and strain proportion of total L. crispatus is estimated using kSanity), and includes samples from weekly visits 1200 to 1500 (week 2 (post-INT) to week 5 (3 weeks post-INT)).

The computation itself has been done at the QC/Pre-processing stage (see the VIBRANT-0-QC-and-preprocessing GitHub repository).

Code
col_by_week5 <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_mg")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 
  

# by qPCR 

col_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "colonized_LBP_by_qpcr")
  ) |> 
  mutate(LBP_colonization_by_week5 = outcome |> replace_na(FALSE)) 


detected_by_week5_qPCR <-
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_LBP_qPCR"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "LBP_detected_by_qpcr")
  ) |> 
  mutate(LBP_detected_by_week5 = outcome |> replace_na(FALSE)) 


# crispatus dominance 

crisp_dom_by_5week <- 
  mae@colData |> as_tibble() |> 
  select(pid, site, randomized, arm, ITT, mITT, PP) |> 
  distinct() |> 
  filter(randomized, mITT) |> 
  left_join(
    mae[["col_crispatus_mg"]] |> 
      as_tibble() |> 
      dplyr::left_join(
        mae@colData |> as_tibble() |> select(uid, pid, visit_code),
        by = join_by(.sample == uid)
      ) |> 
      dplyr::filter(visit_code == "1500", .feature == "crispatus_dominance_by_mg")
  ) |> 
  mutate(crisp_dom_by_5week = outcome |> replace_na(FALSE)) 
Code
summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
table_2 |> 
  group_by(site) |> 
  gt(caption = "Table 1: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    # Blinded = "All blinded arms",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Table 1: Colonization by week 5 by arm and site
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 9 (64 %) 7 (50 %) 7 (50 %) 10 (71 %)
95% CI 0% - 23% 35% - 87% 23% - 77% 23% - 77% 42% - 92%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 1% - 72% 42% - 100% 3% - 100% 3% - 100% 54% - 100%

Visualization of the primary outcome (and a few selected secondary outcomes)

Code
col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Code
col_by_week5_qPCR |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on the relative abundances estimated by qPCR") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) 

Code
detected_by_week5_qPCR |> 
  arrange(site, arm, LBP_detected_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_detected_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("Colonization based on qPCR-based detection of at least 2 LBP strains detected at the same visit by week 5") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) 

Code
crisp_dom_by_5week |> 
  arrange(site, arm, crisp_dom_by_5week) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = crisp_dom_by_5week)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  ggtitle("L. crispatus dominance (≥ 50% by metagenomics)") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
    plot.title = element_text(hjust = 0.5)
  ) 

Tests

We test for differences between the placebo arm and each active arm for the primary outcome using Fisher’s exact test (exhaustive permutation test).

Data from both site are merged for each arm given the low numbers of participants at MGH.

The \(p\)-values are adjusted for multiple testing (multiple active arms against the Placebo) using the Benjamini-Hochberg method.

Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

fisher_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- fisher.test(table_for_test, alternative = "greater")
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      OR = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  fisher_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, OR, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "OR" ~ "OR",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "OR" ~ "",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(caption = "Table 2: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Table 2: Colonization by week 5 by arm and site
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 9 (64 %) 7 (50 %) 7 (50 %) 10 (71 %)
95% CI 35% - 87% 23% - 77% 23% - 77% 42% - 92%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 6 (86 %) 1 (100 %) 1 (100 %) 6 (100 %)
95% CI 42% - 100% 3% - 100% 3% - 100% 54% - 100%
Adj. p-value Ref. <0.001 0.002 0.002 <0.001
OR 39.76 18.61 18.61 60.46
95% CI 5.69 - Inf 2.48 - Inf 2.48 - Inf 8.17 - Inf

Model

[This section still needs more work given the low sample size in some of the arms. Code is not executed].

Code
col_by_week5_agg <- 
  col_by_week5 |>
  group_by(site, arm) |>
  summarise(success = LBP_colonization_by_week5 |> sum(), ntrials = n()) 

bfit <- 
  brm(
    success | trials(ntrials) ~ arm + site + site:arm, 
    data = col_by_week5_agg, 
    family = binomial("logit")
    )
Code
summary(bfit)
plot(bfit, ask = FALSE)
bind_cols(col_by_week5_agg, predict(bfit))
# loo(bfit, moment_match = TRUE, reloo = TRUE)
# pp_check(bfit)
Code
plot_res <- plot(conditional_effects(bfit), ask = FALSE)
Code
plot_res <- lapply(plot_res, function(p) {
  p + scale_color_manual(values = site_colors)
})

Reduce(`+`, plot_res) + 
  plot_layout(widths = c(2.2, 1, 3))
Code
model_results <- summary(bfit)$fixed
model_results <- 
  model_results |> 
  rownames_to_column("term") |> 
  as_tibble() |> 
  mutate(
    factor = 
      case_when(
        str_detect(term, ":site") ~ "Arm x Site",
        str_detect(term, "^site") ~ "Site",
        str_detect(term, "^arm") ~ "Arm",
        TRUE ~ "Ref.",
      ),
    factor_level = 
      term |> 
      str_remove("arm") |> str_remove("site") |> str_replace(":"," x ") |> 
      str_replace("6M", "6-") |> str_replace("CM", "C-")
  ) |> 
  mutate(
    OR = ifelse(factor == "Ref.", NA, exp(Estimate)),
    lower_CI = ifelse(factor == "Ref.", NA, exp(`l-95% CI`)),
    upper_CI = ifelse(factor == "Ref.", NA, exp(`u-95% CI`))
  ) 
Code
model_results |> 
  select(factor, factor_level, OR, lower_CI, upper_CI) |>
  gt() |> 
  gt::fmt_number(
    decimals = 2, n_sigfig = 3
  )
Code
model_results_arm <- 
  model_results |> 
  filter(factor == "Arm") |> 
  mutate(
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(factor_level, OR, CI) |>
  pivot_longer(cols = -factor_level, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = factor_level, values_from = value) 
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    model_results_arm
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "OR" ~ "Ref.",
        name == "CI" ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(caption = "Table 2: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )

Colonization by week 5 (Sensitivity analysis excluding immediately post-product visits)

Defining the alternative primary outcome

Code
new_outcome_def <- 
  # we start from the "colonized at"
  mae[["col_LBP_mg"]] |> 
  as_tibble() |> 
  dplyr::filter(.feature == "colonized_LBP_at_mg") |> 
  dplyr::left_join(
    mae@colData |> as_tibble() |> select(uid, pid, visit_code, arm),
    by = join_by(.sample == uid)
  ) |> 
  filter(!is.na(arm))

# we fill the missing visits
new_outcome_def <- 
  new_outcome_def |> 
  select(-c(arm)) |> 
  dplyr::full_join(
    new_outcome_def |> 
      select(pid, arm) |> 
      distinct() |> 
      expand_grid(
        visit_code = 
          unique(new_outcome_def$visit_code) |> sort() |> setdiff(c("0001","0010"))
        ),
    by = join_by(pid, visit_code)
  ) |> 
  arrange(pid, visit_code) |> 
  mutate(outcome = outcome |> replace_na(FALSE)) |> 
  dplyr::filter(!is.na(arm))
  
# we compute the new outcome
new_outcome_def <- 
  new_outcome_def |>  
  arrange(pid, visit_code) |> 
  group_by(pid) |> 
  mutate(
    tmp = 
      case_when(
        (arm %in% c("Pl", "LC-106-7", "LC-115")) & (as.numeric(visit_code) <= 1200) ~ FALSE,
        (arm %in% c("LC-106-3", "LC-106-o")) & (as.numeric(visit_code) < 1200) ~ FALSE,
        TRUE ~ outcome
      ),
    colonized_LBP_by_mg = cummax(tmp) |> as.logical()
  ) |> 
  ungroup()
Code
new_outcome_def |> 
  left_join(mae@colData |> as_tibble() |> select(pid, mITT) |> distinct()) |>
  ggplot() +
  aes(y = visit_code |> fct_rev(), x = pid, fill = colonized_LBP_by_mg) +
  facet_grid(. ~ arm + ifelse(mITT, "mITT", "ITT"), scales = "free", space = "free") +
  geom_tile() + 
  scale_fill_manual(
    "Colonized by LBP by MG by each visit",
    values = c("gray", "orangered"), labels = c("No", "Yes")
    ) +
  ylab("Visit code") + xlab("Participant IDs") +
  labs(
    caption = "Colonization at missed visits was considered negative."
  ) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
    legend.position = "bottom"
  )

Code
new_outcome_def <- 
  new_outcome_def |> 
  dplyr::rename(colonized_LBP_by_mg_alt = colonized_LBP_by_mg) |> 
  left_join(
    mae[["col_LBP_mg"]] |> 
      as_tibble() |> 
      dplyr::filter(.feature == "colonized_LBP_by_mg") |> 
      select(.sample, outcome) |> 
      dplyr::rename(colonized_LBP_by_mg = outcome)
  )
Code
new_outcome_def |> 
  dplyr::filter(!is.na(colonized_LBP_by_mg)) |> 
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)
new_outcome_def |> 
  dplyr::filter(visit_code == "1500", !is.na(colonized_LBP_by_mg)) |>
  dplyr::count(colonized_LBP_by_mg_alt, colonized_LBP_by_mg)
Code
col_by_week5 <- 
  new_outcome_def |> 
  select(pid, visit_code, colonized_LBP_by_mg_alt) |> 
  left_join(
    mae@colData |> as_tibble() |> 
      select(uid, pid, visit_code, site, randomized, arm, ITT, mITT, PP)
    ) |> 
  dplyr::filter(randomized, mITT) |> 
  dplyr::filter(visit_code == "1500") |> 
  mutate(LBP_colonization_by_week5 = colonized_LBP_by_mg_alt |> replace_na(FALSE)) 

Visualization of the alternative primary outcome

Code
col_by_week5 |> 
  arrange(site, arm, LBP_colonization_by_week5) |>
  group_by(site, arm) |> 
  mutate(participant_number = row_number() |> factor()) |> 
  ungroup() |> 
  ggplot(aes(x = participant_number, y = site |> fct_rev(), col = site, fill = site, shape = LBP_colonization_by_week5)) +
  geom_point(size = 3) +
  facet_grid(. ~ arm, scales = "free_x", space = "free_x") +
  guides(color = "none", fill = "none") +
  scale_shape_manual("Colonized by LBP by week 5 (excluding immediately post-product visit)", values = c(1, 16)) +
  scale_color_manual(values = site_colors) + 
  xlab("Participant number") + ylab("") +
  theme(
    legend.position = "top",
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Results

Code
summary_table <- 
  col_by_week5 |> 
  group_by(site, arm) |> 
  summarize(
    n = n(), 
    n_success = sum(LBP_colonization_by_week5),
    .groups = "drop"
  ) |> 
  mutate(
    p = n_success / n,
    CI = binom::binom.confint(n_success, n, method = "exact")
  )
Code
table_2 <- 
  summary_table |> 
  mutate(
    N_per_site_and_arm = str_c("N = ", n),
    LBP_strain_detected = str_c(n_success, " (", round(p * 100)," %)"),
    CI_text = str_c(round(CI$lower * 100), "% - ", round(CI$upper * 100), "%")
  ) |> 
  dplyr::select(
    site, arm, N_per_site_and_arm, LBP_strain_detected, CI_text
  ) |> 
  pivot_longer(-c(site, arm)) |>
  mutate(
    name = 
      case_when(
        name == "N_per_site_and_arm" ~ "N participants",
        name == "LBP_strain_detected" ~ "n (%) participants\nwith LBP strain detected",
        name == "CI_text" ~ "95% CI"
      )
  ) |> 
  pivot_wider(
    id_cols = c(site, name), 
    names_from = arm,
    values_from = value, values_fill = ""
    )
Code
table_2 |> 
  group_by(site) |> 
  gt(
    caption = "Table 1: Colonization by week 5 by arm and site (excluding immediately post-product visit)", 
     row_group_as_column = TRUE
     ) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    # Blinded = "All blinded arms",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Table 1: Colonization by week 5 by arm and site (excluding immediately post-product visit)
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 8 (57 %) 7 (50 %) 7 (50 %) 6 (43 %)
95% CI 0% - 23% 29% - 82% 23% - 77% 23% - 77% 18% - 71%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 4 (57 %) 1 (100 %) 1 (100 %) 4 (67 %)
95% CI 1% - 72% 18% - 90% 3% - 100% 3% - 100% 22% - 96%

Tests

Same tests as above.

Code
arms_to_test <- col_by_week5$arm |> unique() |> setdiff("Pl")

fisher_test_results <-
  map(
  arms_to_test,
  ~ {
    arm_to_test <- .x
    tmp <- col_by_week5 |> filter(arm %in% c("Pl", arm_to_test))
    table_for_test <- table(tmp$LBP_colonization_by_week5, tmp$arm |> fct_drop())
    test_res <- fisher.test(table_for_test, alternative = "greater")
    tibble(
      arm = arm_to_test,
      p_value = test_res$p.value,
      OR = test_res$estimate,
      lower_CI = test_res$conf.int[1],
      upper_CI = test_res$conf.int[2]
    )
  }
) |> 
  bind_rows() |> 
  mutate(
    adj_p_value = p_value |> p.adjust(method = "BH")
  )
Code
test_results <- 
  fisher_test_results |> 
  mutate(
    adj_p_value = adj_p_value |> scales::pvalue(accuracy = 0.001),
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(arm, adj_p_value, OR, CI) |>
  pivot_longer(cols = -arm, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = arm, values_from = value) |> 
  mutate(
    name = 
      case_when(
        name == "adj_p_value" ~ "Adj. p-value",
        name == "OR" ~ "OR",
        name == "CI" ~ "95% CI",
        TRUE ~ name
      )
  )
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    test_results
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "Adj. p-value" ~ "Ref.",
        name == "OR" ~ "",
        str_detect(name, "CI") ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(
    caption = "Table 2: Colonization by week 5 by arm and site (excluding immediately post-product visit)", 
    row_group_as_column = TRUE
    ) |> 
  cols_width(
    "name" ~ px(150),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )
Table 2: Colonization by week 5 by arm and site (excluding immediately post-product visit)
Placebo LC-106
7 days
LC-106
3 days
LC-106
overlap
LC-115
7 days
CAP N participants N = 14 N = 14 N = 14 N = 14 N = 14
n (%) participants with LBP strain detected 0 (0 %) 8 (57 %) 7 (50 %) 7 (50 %) 6 (43 %)
95% CI 29% - 82% 23% - 77% 23% - 77% 18% - 71%
MGH N participants N = 5 N = 7 N = 1 N = 1 N = 6
n (%) participants with LBP strain detected 1 (20 %) 4 (57 %) 1 (100 %) 1 (100 %) 4 (67 %)
95% CI 18% - 90% 3% - 100% 3% - 100% 22% - 96%
Adj. p-value Ref. 0.002 0.002 0.002 0.002
OR 22.09 18.61 18.61 16.73
95% CI 3.23 - Inf 2.48 - Inf 2.48 - Inf 2.41 - Inf

Model

[As above, this section still needs more work given the low sample size in some of the arms. Code is not executed].

Code
col_by_week5_agg <- 
  col_by_week5 |>
  group_by(site, arm) |>
  summarise(success = LBP_colonization_by_week5 |> sum(), ntrials = n()) 

bfit <- 
  brm(
    success | trials(ntrials) ~ arm + site + site:arm, 
    data = col_by_week5_agg, 
    family = binomial("logit")
    )
Code
summary(bfit)
plot(bfit, ask = FALSE)
bind_cols(col_by_week5_agg, predict(bfit))
# loo(bfit, moment_match = TRUE, reloo = TRUE)
# pp_check(bfit)
Code
plot_res <- plot(conditional_effects(bfit), ask = FALSE)
Code
plot_res <- lapply(plot_res, function(p) {
  p + scale_color_manual(values = site_colors)
})

Reduce(`+`, plot_res) + 
  plot_layout(widths = c(2.2, 1, 3))
Code
model_results <- summary(bfit)$fixed
model_results <- 
  model_results |> 
  rownames_to_column("term") |> 
  as_tibble() |> 
  mutate(
    factor = 
      case_when(
        str_detect(term, ":site") ~ "Arm x Site",
        str_detect(term, "^site") ~ "Site",
        str_detect(term, "^arm") ~ "Arm",
        TRUE ~ "Ref.",
      ),
    factor_level = 
      term |> 
      str_remove("arm") |> str_remove("site") |> str_replace(":"," x ") |> 
      str_replace("6M", "6-") |> str_replace("CM", "C-")
  ) |> 
  mutate(
    OR = ifelse(factor == "Ref.", NA, exp(Estimate)),
    lower_CI = ifelse(factor == "Ref.", NA, exp(`l-95% CI`)),
    upper_CI = ifelse(factor == "Ref.", NA, exp(`u-95% CI`))
  ) 
Code
model_results |> 
  select(factor, factor_level, OR, lower_CI, upper_CI) |>
  gt() |> 
  gt::fmt_number(
    decimals = 2, n_sigfig = 3
  )
Code
model_results_arm <- 
  model_results |> 
  filter(factor == "Arm") |> 
  mutate(
    OR = OR |> round(2) |> as.character(),
    CI = str_c(lower_CI |> round(2), " - ", upper_CI |> round(2))
    ) |> 
  select(factor_level, OR, CI) |>
  pivot_longer(cols = -factor_level, names_to = "name", values_to = "value") |>
  pivot_wider(names_from = factor_level, values_from = value) 
Code
table_2_with_model_results <- 
  bind_rows(
    table_2,
    model_results_arm
  ) |> 
  mutate(
    site = site |> str_replace_na(""),
    Pl = 
      case_when(
        name == "OR" ~ "Ref.",
        name == "CI" ~ "",
        TRUE ~ Pl
      )
  )
Code
table_2_with_model_results |> 
  group_by(site) |>
  gt(caption = "Table 2: Colonization by week 5 by arm and site", row_group_as_column = TRUE) |> 
  cols_width(
    "name" ~ px(200),
    everything() ~ px(120)
  ) |> 
  cols_label(
    name = "",
    Pl = "Placebo",
    `LC-106-7` = "LC-106<br>7 days",
    `LC-106-3` = "LC-106<br>3 days",
    `LC-106-o` = "LC-106<br>overlap",
    `LC-115` = "LC-115<br>7 days",
    .fn = md
  )

Secondary outcomes (Figures 2-4)

Kinetics of colonization

Total relative abundances of LBP strains

Code
total_rel_ab_of_LBP <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  left_join(
    mae |> colData() |> as_tibble(), 
    by = join_by(uid)
  ) |>
  filter(randomized, mITT) |> 
  group_by(.sample, uid) |>
  summarize(
    total_LBP_rel_ab = sum(rel_ab, na.rm = TRUE),
    .groups = "drop"
  )

total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid))


total_rel_ab_of_LBP <- 
  total_rel_ab_of_LBP |> 
  left_join(
    total_rel_ab_of_LBP |> 
      select(pid, arm, site) |> 
      distinct() |> 
      group_by(arm, site) |> 
      arrange(pid) |> 
      mutate(pid_nb = row_number()) |> 
      ungroup()
  )
Code
g_tot_prop_LBP_strains <- 
  total_rel_ab_of_LBP |>
  filter(visit_type == "Clinic", !is.na(arm), PIPV) |> 
  ggplot(aes(x = study_week, y = total_LBP_rel_ab, col = arm)) +
  geom_line(aes(group = pid), alpha = 0.5) +
  geom_point(size = 0.5, alpha = 0.5) +
  facet_grid(site ~ arm) + 
  scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
  scale_y_continuous(
    "Total relative abundances\nof LBP strains", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  expand_limits(y = 0) +
  scale_color_manual(values = arm_colors) +
  guides(col = "none") +
  ggtitle(mae_title(mae)) 
Code
g_tot_prop_LBP_strains

Code
g_tot_prop_LBP_strains +
  geom_label(aes(label = pid_nb), size = 2) +
  labs(
    caption = "Participants are labeled by their enrollment order within each arm and site."
  )

Proportions of participants who colonized in each arm and each visit

Code
get_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_LBP) {
    total_rel_ab_of_LBP |> 
      filter(!is.na(arm), PIPV) |> 
      group_by(site, arm, study_week) |>
      summarize(
        prop_who_colonized_at = mean(total_LBP_rel_ab > 0.05), 
        CI_low = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[1],
        CI_high = 
          prop.test(
            x = sum(total_LBP_rel_ab > 0.05), n = n(), 
            conf.level = 0.95)$conf.int[2],
        .groups = "drop"
      )
  }

plot_longitudinal_proportions_of_participants_who_colonized <- 
  function(total_rel_ab_of_LBP, mae){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_ribbon(
        aes(ymin = CI_low, ymax = CI_high, fill = arm), 
        alpha = 0.1, color = NA
      ) +
      geom_line() +
      geom_point() +
      facet_grid(site ~ .) + 
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors) +
      ggtitle(mae_title(mae)) 
  }
Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP, mae)

g_prop_colonized_by

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC-106-3", "LC-106-o")) & (site == "MGH"))), 
    mae
    )

g_prop_colonized_by

Code
plot_longitudinal_proportions_of_participants_who_colonized_v2 <- 
  function(total_rel_ab_of_LBP, mae){
    
    get_longitudinal_proportions_of_participants_who_colonized(total_rel_ab_of_LBP) |>
      ggplot(aes(x = study_week, y = prop_who_colonized_at, col = arm)) +
      geom_linerange(
        aes(ymin = CI_low, ymax = CI_high, col = arm), 
        alpha = 0.4, lineend = "round", linewidth = 1,
        position = position_dodge(width = 0.5)
      ) +
      geom_line(position = position_dodge(width = 0.5), linewidth = 0.8) +
      geom_point(aes(shape = arm), position = position_dodge(width = 0.5), size = 2) +
      facet_grid(site ~ .) + 
      scale_shape_manual("Arm", values = c(1, 19, 15, 17, 8)) +
      scale_x_continuous("Study weeks", breaks = c(-1:5,7,9,12), minor_breaks = 0:12) +
      scale_y_continuous(
        "Proportion of participants\nwith detected LBP strains", labels = scales::percent,
        breaks = seq(0, 1, by = 0.2)
      ) +
      expand_limits(y = 0) +
      scale_color_manual("Arm", values = arm_colors) +
      scale_fill_manual("Arm", values = arm_colors) +
      ggtitle(mae_title(mae)) 
    
  }

Alternative visualization:

Code
g_prop_colonized_by <- plot_longitudinal_proportions_of_participants_who_colonized_v2(total_rel_ab_of_LBP, mae)

g_prop_colonized_by

Code
g_prop_colonized_by <- 
  plot_longitudinal_proportions_of_participants_who_colonized_v2(
    total_rel_ab_of_LBP |> filter(!((arm %in% c("LC-106-3", "LC-106-o")) & (site == "MGH"))), 
    mae
    )

g_prop_colonized_by

Figure 2

Code
 (g_prop_colonized_by + ggtitle("")) +
  (g_tot_prop_LBP_strains + ggtitle("")) +
  plot_layout(guides = "collect", widths = c(1, 2)) +
  plot_annotation(tag_level = "a") &
  theme(legend.position = "bottom")

Microbiota trajectories

Longitudinal stack relative abundances plot:

Code
rel_abs <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  select(
    .feature, .sample, uid, rel_ab, poor_quality_mg_data,
    taxon_label, genus, LBP, strain_origin
    ) 
Code
rel_abs <- 
  rel_abs |> 
  group_by(.feature) |> 
  mutate(max_rel_ab = max(rel_ab, na.rm = TRUE), mean_rel_ab = mean(rel_ab, na.rm = TRUE)) |> 
  ungroup() |> 
  arrange(is.na(LBP), -mean_rel_ab) |> 
  mutate(.feature = .feature |> fct_inorder()) |> 
  mutate(
    taxon = 
      case_when(
        as.numeric(.feature) <= 29 ~ taxon_label,
        (genus == "Lactobacillus") ~ "Other Lactobacillus",
        TRUE ~ "Other non-Lacto."
      )
  ) |> 
  arrange(is.na(LBP), LBP, genus != "Lactobacillus", -mean_rel_ab) |> 
  mutate(taxon = taxon |> fct_inorder() |> fct_relevel("Other Lactobacillus", "Other non-Lacto.", "Other", after = Inf))
Code
rel_abs_agg <- 
  rel_abs |> 
  group_by(uid, taxon, LBP, strain_origin, poor_quality_mg_data) |> 
  summarize(rel_ab = sum(rel_ab), .groups = "drop") |> 
  dplyr::left_join(mae |> colData() |> as_tibble(), by = join_by(uid)) 
Code
g_trajectories <-
  rel_abs_agg |>
  filter(mITT, randomized != 0, PIPV, !poor_quality_mg_data) |>
  group_by(pid) |> 
  mutate(
    tot_LC115 = sum(rel_ab[LBP == "LC-115"]),
    tot_LBP = sum(rel_ab[!is.na(LBP)])
    ) |> 
  ungroup() |> 
  arrange(desc(PP), -tot_LC115, -tot_LBP) |> 
  mutate(
    pid = pid |> fct_inorder(),
    visit_label = ifelse(visit != "screening", str_c(visit, " (W", study_week, ")"), visit |> as.character()) |> fct_reorder(visit |> as.numeric()),
    ) |> 
  ggplot(aes(y = rel_ab, x = pid, fill = taxon, alpha = ifelse(PP, "PP", "mITT"))) +
  geom_col() +
  facet_grid(visit_label ~ arm + site, scales = "free", space = "free") +
  scale_fill_manual(
    values = get_taxa_colors(rel_abs_agg$taxon |> levels()),
    labels = get_taxa_labels(rel_abs_agg$taxon |> levels()),
    guide = guide_legend(ncol = 1)
    ) +
  scale_alpha_discrete("Population", range = c(0.5, 1)) +
  scale_y_continuous(
    "relative abundance", labels = scales::percent,
    breaks = seq(0, 0.5, by = 0.5), minor_breaks = seq(0, 1, by = 0.25)
  ) +
  scale_x_discrete("Participants") +
  theme(
    strip.text.y = element_text(angle = 0, hjust = 0),
    strip.text.x = element_text(angle = 90, hjust = 0),
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  )
Code
g_trajectories

Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows. The alpha level (transparency) indicates whether the participant is part of the per-protocol (PP) or modified intention-to-treat (mITT) population.
Code
# checking the data for the placebo participant that has the LBP strains at MGH

mae[["mg"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 


mae[["qPCR"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) 

mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", study_week == 5) |> 
  select(.feature, .sample, rel_ab, exclude_sample)


mae[["amplicon"]] |> 
  as_tibble() |> 
  dplyr::left_join(mae@colData |> as_tibble()) |> 
  filter(pid == "068100062", .feature == "Lactobacillus_crispatus") |> 
  select(.feature, .sample, visit_code, rel_ab, exclude_sample)

Individual LBP strains by sites

Code
LBP_strains <- 
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP)) |> 
  select(.feature, .sample, uid, rel_ab, LBP, strain_origin, poor_quality_mg_data) |> 
  left_join(mae |> colData() |> as_tibble()) |> 
  filter(mITT, !poor_quality_mg_data, randomized)
Code
LBP_strains |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site) +
  # geom_point(alpha = 0.5, size = 0.5, position = position_dodge(width = 0.5)) +
  ggbeeswarm::geom_quasirandom(alpha = 0.5, size = 0.5, dodge.width = 0.5) +
  # geom_boxplot(outlier.size = 0.5) +
  facet_grid(study_week ~ LBP + strain_origin , scales = "free_x", space = "free_x") +
  xlab("LBP strain") +
  scale_y_continuous(
    "Relative abundance bacteria", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual(values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Code
LBP_strains |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = .feature, y = rel_ab, col = site, fill = site) +
  # geom_point(alpha = 0.5, size = 0.5) +
  geom_boxplot(outlier.size = 0.5, alpha = 0.4) +
  facet_grid(study_week ~ LBP + strain_origin , scales = "free_x", space = "free_x") +
  xlab("LBP strain") +
  scale_y_continuous(
    "Relative abundance bacteria", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_color_manual(values = site_colors) +
  scale_fill_manual(values = site_colors) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Code
LBP_strains |> 
  filter(!is.na(arm), !is.na(study_week), study_week >= 2) |> 
  ggplot() +
  aes(x = study_week, y = rel_ab, col = strain_origin) +
  geom_point(alpha = 0.5, size = 0.5) +
  geom_line(aes(group = pid), alpha = 0.5) +
  facet_grid(site ~ LBP |> str_replace("&", "&\n") + strain_origin + .feature , scales = "free_x", space = "free_x") +
  xlab("Study week") +
  scale_y_continuous(
    "Relative abundance bacteria", labels = scales::percent,
    breaks = seq(0, 1, by = 0.2)
  ) +
  #scale_color_manual(values = colorspace::darken(scale_colour_hue()$palette(2), amount = 0.4)) +
  scale_color_manual("Strain origin", values = strain_origin_colors) +
  scale_x_continuous("Study weeks", breaks = c(0:5,7,9,12), minor_breaks = 0:12) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  ) 

Proportion colonized among exposed

Code
tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  filter(PIPV, as.numeric(visit_code) >= 1200) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC-115 strains, this is only participants of arm LC-115
  # For LC-106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC-115" ~ (arm == "LC-115"),
      LBP == "LC-106 & LC-115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )
Code
tmp |> 
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
  facet_grid(. ~ LBP + strain_origin, scales = "free", space = "free") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("LBP strain") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Code
tmp |> 
  mutate(
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything()) 
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure.
LBP strain CAP MGH
LC-106 & LC-115
(SA strains)
FF00018 7% (4/56)
[3% - 17%]
20% (3/15)
[7% - 45%]
FF00051 30% (17/56)
[20% - 43%]
67% (10/15)
[42% - 85%]
UC101 25% (14/56)
[16% - 38%]
73% (11/15)
[48% - 89%]
LC-106 & LC-115
(US strains)
C0022A1 25% (14/56)
[16% - 38%]
47% (7/15)
[25% - 70%]
C0059E1 61% (34/56)
[48% - 72%]
87% (13/15)
[62% - 96%]
C0175A1 39% (22/56)
[28% - 52%]
67% (10/15)
[42% - 85%]
LC-115
(SA strains)
122010 57% (8/14)
[33% - 79%]
33% (2/6)
[10% - 70%]
185329 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00004 50% (7/14)
[27% - 73%]
83% (5/6)
[44% - 97%]
FF00064 50% (7/14)
[27% - 73%]
83% (5/6)
[44% - 97%]
FF00072 7% (1/14)
[1% - 31%]
50% (3/6)
[19% - 81%]
UC119 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
LC-115
(US strains)
C0006A1 50% (7/14)
[27% - 73%]
67% (4/6)
[30% - 90%]
C0028A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0112A1 64% (9/14)
[39% - 84%]
83% (5/6)
[44% - 97%]

Excluding immediate post-product visit

Code
tmp <- 
  # we use the metagenomics data for this
  mae[["mg"]] |> 
  as_tibble() |> 
  filter(!is.na(LBP), !poor_quality_mg_data) |> 
  select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |> 
  # add the arm, mITT, etc.
  left_join(
    mae |> colData() |> as_tibble() |> 
      select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), 
    by = join_by(uid)
  ) |>
  # only consider the mITT
  filter(randomized, mITT) |> 
  mutate(
    include_visit = 
      case_when(
        (arm %in% c("LC-106-7", "LC-115")) & as.numeric(visit_code) > 1200 ~ TRUE,
        (arm %in% c("LC-106-3", "LC-106-o")) & as.numeric(visit_code) >= 1200 ~TRUE,
        TRUE ~ FALSE
      )
  ) |> 
  filter(PIPV, include_visit) |> # only include visits after the first 2 weeks
  # define the outcome of interest for each person and strain
  group_by(.feature, LBP, strain_origin, pid, arm, site) |>
  summarize(
    ever_colonized = any(rel_ab >= 0.05, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  # exclude Placebo participants because, they were, in theory, not exposed to any of the strains.
  filter(arm != "Pl") |> 
  # define the total number of participants exposed for each strain.
  # For LC-115 strains, this is only participants of arm LC-115
  # For LC-106 strains, this is participants from all active arms.
  mutate(
    exposed = case_when(
      LBP == "LC-115" ~ (arm == "LC-115"),
      LBP == "LC-106 & LC-115" ~ TRUE,
      TRUE ~ FALSE
    )
  ) |> 
  # we compute the statistics of interest
  group_by(.feature, LBP, strain_origin, site) |> 
  summarize(
    n_exposed = sum(exposed, na.rm = TRUE),
    n_colonized = sum(ever_colonized & exposed, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(
    prop_colonized = n_colonized / n_exposed,
    CI_low = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$lower,
    CI_high = binom::binom.confint(n_colonized, n_exposed, method = "wilson")$upper
  )
Code
tmp |> 
  ggplot() +
  aes(x = .feature, y = prop_colonized, col = site) +
  facet_grid(. ~ LBP + strain_origin, scales = "free", space = "free") +
  geom_errorbar(
    aes(ymin = CI_low, ymax = CI_high), 
    linewidth = 1, alpha = 0.7,
    width = 0.3, position = position_dodge(width = 0.5)
    ) +
  geom_point(position = position_dodge(width = 0.5), size = 2) +
  scale_color_manual("Study site", values = site_colors) +
  xlab("LBP strain") + 
  ylab("Proportion of participants colonized\namong those exposed") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Code
tmp |> 
  mutate(
    CI = str_c(
      "[", 
      scales::percent(CI_low, accuracy = 1), 
      " - ", 
      scales::percent(CI_high, accuracy = 1), 
      "]"
    ),
    value = 
      str_c(
        scales::percent(prop_colonized, accuracy = 1), 
        " (", n_colonized, "/", n_exposed,")<br>",
       CI
      ),
    strain_info = str_c(LBP, "<br>(", strain_origin, " strains)")
  ) |> 
  select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |> 
  pivot_wider(
    names_from = c(site), 
    values_from = value
  ) |>
  arrange(strain_info) |> 
  group_by(strain_info) |> 
  gt(
    row_group_as_column = TRUE, 
    process_md = TRUE,
    caption = "Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms)."
  ) |> 
  cols_label(.feature = "LBP strain") |> 
  cols_align(columns = .feature, align = "left") |> 
  cols_align(columns = c("CAP", "MGH"), align = "center") |>
  fmt_markdown(columns = everything()) 
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms).
LBP strain CAP MGH
LC-106 & LC-115
(SA strains)
FF00018 2% (1/56)
[0% - 9%]
7% (1/15)
[1% - 30%]
FF00051 9% (5/56)
[4% - 19%]
20% (3/15)
[7% - 45%]
UC101 2% (1/56)
[0% - 9%]
7% (1/15)
[1% - 30%]
LC-106 & LC-115
(US strains)
C0022A1 21% (12/56)
[13% - 34%]
33% (5/15)
[15% - 58%]
C0059E1 43% (24/56)
[31% - 56%]
40% (6/15)
[20% - 64%]
C0175A1 39% (22/56)
[28% - 52%]
67% (10/15)
[42% - 85%]
LC-115
(SA strains)
122010 36% (5/14)
[16% - 61%]
17% (1/6)
[3% - 56%]
185329 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00004 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00064 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
FF00072 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
UC119 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
LC-115
(US strains)
C0006A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0028A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]
C0112A1 0% (0/14)
[0% - 22%]
0% (0/6)
[0% - 39%]